{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4f0cfeed",
   "metadata": {},
   "source": [
    "# Deploying an MCP STDIO Server as a scalable HTTP service with Ray Serve\n",
    "\n",
    "Deploying an existing MCP as a HTTP Service with Ray Serve, as shown in the tutorial, can make your service more reliable and easier to scale. This approach is beneficial for the following reasons:\n",
    "\n",
    "## Addressing MCP stdio Mode limitations\n",
    "[MCP in stdio mode](https://modelcontextprotocol.io/docs/concepts/transports#standard-input%2Foutput-stdio), which uses standard input/output streams, is typically run locally for command-line tools or simple integrations. This makes it difficult to deploy as a service because it relies on local process communication, which isn't suitable for distributed or cloud environments.\n",
    "\n",
    "Many of the official Docker images on the “shelf” default to stdio mode, making them incompatible with remote servers and large-scale deployments. By using Ray Serve, you can expose any stdio-based MCP server as an HTTP service without modifying or rebuilding your existing Docker images. This approach delivers several key benefits:\n",
    "\n",
    "* **No code changes or image rebuilds**: You don’t have to rewrite your MCP server or rebuild its Docker images—Ray Serve wraps the existing container and handles the transport layer for you.\n",
    "\n",
    "* **Automatic tool discovery**: Retrieve a list of available tools via a simple HTTP GET to the /tools endpoint—no custom scripting required.\n",
    "\n",
    "* **Standardized HTTP API**: Invoke any tool by POSTing to the /call endpoint, passing the tool name and parameters in JSON.\n",
    "\n",
    "* **Cloud-native scalability**: Deploy behind load balancers, autoscale horizontally, and integrate with service meshes or API gateways as you would with any other HTTP microservice.\n",
    "\n",
    "By translating stdio-mode MCP servers into HTTP endpoints with Ray Serve, you gain the flexibility and reliability needed for production-grade deployments—without touching your existing codebase. The following architecture diagram illustrates deploying a MCP Docker image with Ray Serve:\n",
    "\n",
    "<img\n",
    "  src=\"https://agent-and-mcp.s3.us-east-2.amazonaws.com/mcp/diagrams/single_mcp_docker_ray_serve.png\"\n",
    "  alt=\"Deploy a Single MCP Docker Image with Ray Serve Architecture\"\n",
    "  style=\"width:30%; display: block; margin: 0 auto;\"\n",
    "/>\n",
    "\n",
    "\n",
    "## Benefits of Ray Serve deployment on Anyscale\n",
    "Converting MCP to a HTTP service using Ray Serve, as shown in the tutorial, addresses the deployment challenges of stdio mode. It makes the service easier to manage and deploy, especially in production, with additional features:\n",
    "\n",
    "**Ray Serve capabilities:**\n",
    "* **Autoscaling**: Ray Serve automatically adjusts the number of replicas based on traffic demand, ensuring your service handles increased load while maintaining responsiveness during peak usage periods.\n",
    "* **Load balancing**: Ray Serve intelligently distributes incoming requests across available replicas, preventing any single instance from becoming overwhelmed and maintaining consistent performance.\n",
    "* **Observability**: Built-in monitoring capabilities provide visibility into your service's performance, including request metrics, resource utilization, and system health indicators.\n",
    "* **Fault tolerance**: Ray Serve automatically detects and recovers from failures by restarting failed components and redistributing requests to healthy replicas, ensuring continuous service availability.\n",
    "\n",
    "**Anyscale service additional benefits:**\n",
    "* **Production ready**: Anyscale provides enterprise-grade infrastructure management and automated deployments that make your MCP service ready for real-world production traffic.\n",
    "* **[High availability](https://docs.anyscale.com/platform/services/faq#does-services-support-multiple-availability-zones-for-high-availability)**: Advanced availability zone aware scheduling mechanisms and zero-downtime rolling updates to ensure your service maintains high availability.\n",
    "* **[Logging](https://docs.anyscale.com/monitoring/accessing-logs) and [Tracing](https://docs.anyscale.com/monitoring/tracing)**: Enhanced observability with comprehensive logging, distributed tracing, and real-time monitoring dashboards that provide deep insights into request flows and system performance.\n",
    "* **[Head node fault tolerance](https://docs.anyscale.com/platform/services/head-node-ft/)**: Additional resilience through managed head node redundancy, protecting against single points of failure in your Ray cluster's coordination layer.\n",
    "* **Composition**: Build complex services by orchestrating multiple deployments into a single pipeline, allowing you to chain preprocessing, model inference, postprocessing, and custom logic seamlessly.\n",
    "\n",
    "\n",
    "**Note**:\n",
    "* If you want to use **off-the-shelf MCP Docker images** to deploy a scalable MCP service, this tutorial still works. However, with this approach you need to build some custom code in your agent to list and call the tools properly. \n",
    "* For **deeper integrations with Ray Serve using your own custom MCP tools**, you can also use MCP in Streamable HTTP mode with Ray Serve. See Notebook #1 and #2 for that approach. This allows you directly [integrate Claude with remote MCP servers](https://support.anthropic.com/en/articles/11175166-about-custom-integrations-using-remote-mcp). \n",
    "\n",
    "\n"
   ]
  },
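  {
   "cell_type": "markdown",
   "id": "9b1e4c2a",
   "metadata": {},
   "source": [
    "As a rough illustration of the `/tools` and `/call` endpoints described above, the following sketch builds the two HTTP requests with the Python standard library without sending them. The helper names `build_tools_request` and `build_call_request` and the `http://localhost:8000` address are assumptions for illustration; the paths and the `tool_name` and `tool_args` keys follow the deployment defined later in this tutorial.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import urllib.request\n",
    "\n",
    "\n",
    "def build_tools_request(base_url):\n",
    "    # GET /tools lists every tool the wrapped MCP server exposes.\n",
    "    return urllib.request.Request(f'{base_url}/tools', method='GET')\n",
    "\n",
    "\n",
    "def build_call_request(base_url, tool_name, tool_args):\n",
    "    # POST /call carries the tool name and its arguments as a JSON body.\n",
    "    body = json.dumps({'tool_name': tool_name, 'tool_args': tool_args}).encode('utf-8')\n",
    "    return urllib.request.Request(\n",
    "        f'{base_url}/call',\n",
    "        data=body,\n",
    "        headers={'Content-Type': 'application/json'},\n",
    "        method='POST',\n",
    "    )\n",
    "\n",
    "\n",
    "# Assumed local address; replace it with your own service URL.\n",
    "req = build_call_request('http://localhost:8000', 'brave_web_search', {'query': 'ray serve'})\n",
    "print(req.get_method(), req.full_url)\n",
    "print(req.data.decode('utf-8'))\n",
    "```\n",
    "\n",
    "To send a request, pass it to `urllib.request.urlopen`, or use any HTTP client you prefer."
   ]
  },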
  {
   "cell_type": "markdown",
   "id": "7955c87b",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "- Ray [Serve], already included in the base Docker image\n",
    "- Podman\n",
    "- A Brave API key set in your environment (`BRAVE_API_KEY`)\n",
    "- MCP Python library \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3869b666",
   "metadata": {},
   "source": [
    "### Dependencies\n",
    "\n",
    "**Build Docker image for Ray Serve deployment**\n",
    "\n",
    "In this tutorial you need to [build a Docker image for deployment on Anyscale](https://docs.anyscale.com/configuration/dependency-management/dependency-byod/) using the [Dockerfile included in this code repo](./Dockerfile). \n",
    "\n",
    "The reason is that when you run `apt-get install -y podman` (e.g. installing a system package) from the workspace terminal, it only lives in the Ray head node and is not propagated to your Ray worker nodes. \n",
    "\n",
    "After building the Docker image, navigate to the **Dependencies** tab in Workspaces and select the corresponding image you just created, and set the **BRAVE_API_KEY** environment variable.\n",
    "\n",
    "**Note**\n",
    " This Docker image is provided solely to deploy the MCP with Ray Serve. Ensure that your MCP docker images, like `docker.io/mcp/brave-search`, are already published to your own private registry or public registry. \n",
    "\n",
    "### Common issues\n",
    "\n",
    "1. **FileNotFoundError: [Errno 2] No such file or directory**\n",
    "- Usually indicates Podman isn't installed correctly. Verify the Podman installation.\n",
    "\n",
    "2. **KeyError: 'BRAVE_API_KEY'**\n",
    "- Ensure you have exported BRAVE_API_KEY in your environment or included it in your dependency configuration."
   ]
  },
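  {
   "cell_type": "markdown",
   "id": "5d7a0e3f",
   "metadata": {},
   "source": [
    "Both issues above can be caught before starting the service. The following is a minimal sketch of such a check, not part of the deployment itself; the `preflight` helper name is hypothetical, and the check only covers the machine it runs on, so worker nodes still depend on the Docker image and environment configuration described earlier.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import shutil\n",
    "\n",
    "\n",
    "def preflight(env=None, which=shutil.which):\n",
    "    # Return a list of human-readable problems; an empty list means the checks passed.\n",
    "    env = os.environ if env is None else env\n",
    "    problems = []\n",
    "    if which('podman') is None:\n",
    "        # A missing podman binary typically surfaces later as FileNotFoundError.\n",
    "        problems.append('podman not found on PATH')\n",
    "    if not env.get('BRAVE_API_KEY'):\n",
    "        # A missing key surfaces later as KeyError: 'BRAVE_API_KEY'.\n",
    "        problems.append('BRAVE_API_KEY is not set')\n",
    "    return problems\n",
    "\n",
    "\n",
    "for problem in preflight():\n",
    "    print('Preflight problem:', problem)\n",
    "```"
   ]
  },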
  {
   "cell_type": "markdown",
   "id": "46eedb56",
   "metadata": {},
   "source": [
    "## 1. Create the deployment file\n",
    "Save the following code as `brave_mcp_ray_serve.py`. This script defines a Ray Serve deployment that proxies requests to the MCP Brave Search server with Podman:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a0bca811",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import os\n",
    "import asyncio\n",
    "import logging\n",
    "from contextlib import AsyncExitStack\n",
    "from typing import Any, Dict, List\n",
    "\n",
    "from fastapi import FastAPI, Request, HTTPException\n",
    "from ray import serve\n",
    "\n",
    "from mcp import ClientSession, StdioServerParameters\n",
    "from mcp.client.stdio import stdio_client\n",
    "\n",
    "app = FastAPI()\n",
    "logger = logging.getLogger(\"MCPDeployment\")\n",
    "\n",
    "\n",
    "@serve.deployment(num_replicas=3, ray_actor_options={\"num_cpus\": 0.5})\n",
    "@serve.ingress(app)\n",
    "class BraveSearchDeployment:\n",
    "    \"\"\"MCP deployment that exposes every tool provided by its server.\n",
    "\n",
    "    * **GET  /tools** - list tools (name, description, and input schema)\n",
    "    * **POST /call** - invoke a tool\n",
    "\n",
    "      ```json\n",
    "      {\n",
    "        \"tool_name\": \"<name>\",   // optional - defaults to brave_web_search\n",
    "        \"tool_args\": { ... }      // **required** - arguments for the tool\n",
    "      }\n",
    "      ```\n",
    "    \"\"\"\n",
    "\n",
    "    DEFAULT_TOOL = \"brave_web_search\"\n",
    "\n",
    "    def __init__(self) -> None:\n",
    "        self._init_task = asyncio.create_task(self._initialize())\n",
    "\n",
    "    # ------------------------------------------------------------------ #\n",
    "    # 1. Start podman + MCP session\n",
    "    # ------------------------------------------------------------------ #\n",
    "    async def _initialize(self) -> None:\n",
    "        params = StdioServerParameters(\n",
    "            command=\"podman\",\n",
    "            args=[\n",
    "                \"run\",\n",
    "                \"-i\",\n",
    "                \"--rm\",\n",
    "                \"-e\",\n",
    "                f\"BRAVE_API_KEY={os.environ['BRAVE_API_KEY']}\",\n",
    "                \"docker.io/mcp/brave-search\",\n",
    "            ],\n",
    "            env=os.environ.copy(),\n",
    "        )\n",
    "\n",
    "        self._exit_stack = AsyncExitStack()\n",
    "\n",
    "        stdin, stdout = await self._exit_stack.enter_async_context(stdio_client(params))\n",
    "\n",
    "        self.session: ClientSession = await self._exit_stack.enter_async_context(ClientSession(stdin, stdout))\n",
    "        await self.session.initialize()\n",
    "\n",
    "        logger.info(\"BraveSearchDeployment replica ready.\")\n",
    "\n",
    "    async def _ensure_ready(self) -> None:\n",
    "        \"\"\"Block until _initialize finishes (and surface its errors).\"\"\"\n",
    "        await self._init_task\n",
    "\n",
    "    # ------------------------------------------------------------------ #\n",
    "    # 2. Internal helper: list tools\n",
    "    # ------------------------------------------------------------------ #\n",
    "    async def _list_tools(self) -> List[Dict[str, Any]]:\n",
    "        await self._ensure_ready()\n",
    "        resp = await self.session.list_tools()\n",
    "        return [\n",
    "            {\n",
    "                \"name\": tool.name,\n",
    "                \"description\": tool.description,\n",
    "                \"input_schema\": tool.inputSchema,\n",
    "            }\n",
    "            for tool in resp.tools\n",
    "        ]\n",
    "\n",
    "    # ------------------------------------------------------------------ #\n",
    "    # 3. HTTP endpoints\n",
    "    # ------------------------------------------------------------------ #\n",
    "    @app.get(\"/tools\")\n",
    "    async def tools(self):\n",
    "        \"\"\"Return all tools exposed by the backing MCP server.\"\"\"\n",
    "        return {\"tools\": await self._list_tools()}\n",
    "\n",
    "    @app.post(\"/call\")\n",
    "    async def call_tool(self, request: Request):\n",
    "        \"\"\"Generic endpoint to invoke any tool exposed by the server.\"\"\"\n",
    "        body = await request.json()\n",
    "\n",
    "        tool_name: str = body.get(\"tool_name\", self.DEFAULT_TOOL)\n",
    "        tool_args: Dict[str, Any] | None = body.get(\"tool_args\")\n",
    "\n",
    "        if tool_args is None:\n",
    "            raise HTTPException(400, \"must include 'tool_args'\")\n",
    "\n",
    "        await self._ensure_ready()\n",
    "\n",
    "        try:\n",
    "            result = await self.session.call_tool(tool_name, tool_args)\n",
    "            return {\"result\": result}\n",
    "        except Exception as exc:\n",
    "            logger.exception(\"MCP tool call failed\")\n",
    "            raise HTTPException(500, \"Tool execution error\") from exc\n",
    "\n",
    "    # ------------------------------------------------------------------ #\n",
    "    # 4. Tidy shutdown\n",
    "    # ------------------------------------------------------------------ #\n",
    "    async def __del__(self):\n",
    "        if hasattr(self, \"_exit_stack\"):\n",
    "            await self._exit_stack.aclose()\n",
    "\n",
    "\n",
    "# Entry-point object for `serve run …`\n",
    "brave_search_tool = BraveSearchDeployment.bind()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68ca040d",
   "metadata": {},
   "source": [
    "**Note:**\n",
    "\n",
    "* In the Ray cluster, use **Podman** instead of Docker to run and manage containers. This approach aligns with the guidelines provided in the [Ray Serve multi-app container deployment documentation](https://docs.ray.io/en/latest/serve/advanced-guides/multi-app-container.html).\n",
    "\n",
    "* Additionally, for images such as `\"docker.io/mcp/brave-search\"`, explicitly include the **`\"docker.io/\"`** prefix to ensure Podman correctly identifies the image URI.\n",
    "\n",
    "* Set the `@serve.deployment(num_replicas=3, ray_actor_options={\"num_cpus\": 0.5})` as an example. For more details to configure Ray Serve deployments, see https://docs.ray.io/en/latest/serve/configure-serve-deployment.html."
   ]
  },
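  {
   "cell_type": "markdown",
   "id": "e2c8f416",
   "metadata": {},
   "source": [
    "If a fixed replica count doesn't fit your traffic, one option is to let Ray Serve autoscale the deployment. The following is a minimal sketch with hypothetical values, not a tuned configuration. It assumes a recent Ray release where the autoscaling target is named `target_ongoing_requests`; check the Ray Serve documentation for the key names in your version.\n",
    "\n",
    "```python\n",
    "# Hypothetical values; tune them for your own traffic.\n",
    "AUTOSCALING_CONFIG = {\n",
    "    'min_replicas': 1,\n",
    "    'max_replicas': 5,\n",
    "    'target_ongoing_requests': 2,\n",
    "}\n",
    "\n",
    "# Options for the decorator. num_replicas is omitted on purpose because\n",
    "# a fixed replica count and autoscaling are alternatives.\n",
    "DEPLOYMENT_OPTIONS = {\n",
    "    'autoscaling_config': AUTOSCALING_CONFIG,\n",
    "    'ray_actor_options': {'num_cpus': 0.5},\n",
    "}\n",
    "\n",
    "# Intended use in the deployment file, replacing the fixed num_replicas=3:\n",
    "#   @serve.deployment(**DEPLOYMENT_OPTIONS)\n",
    "print(DEPLOYMENT_OPTIONS)\n",
    "```"
   ]
  },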
  {
   "cell_type": "markdown",
   "id": "dad291df",
   "metadata": {},
   "source": [
    "## 2. Run the service with Ray Serve in the workspace\n",
    "\n",
    "You can run the following command in the terminal to deploy the service using Ray Serve:\n",
    "\n",
    "```bash\n",
    "serve run brave_mcp_ray_serve:brave_search_tool\n",
    "```\n",
    "This starts the service on `http://localhost:8000`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef3e5f84",
   "metadata": {},
   "source": [
    "## 3. Test the service\n",
    "**List available tools**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9c0f784",
   "metadata": {},
   "outputs": [],
   "source": [
    "import httpx, asyncio\n",
    "from pprint import pprint\n",
    "import requests\n",
    "\n",
    "BASE_URL = \"http://localhost:8000\"\n",
    "\n",
    "response = requests.get(f\"{BASE_URL}/tools\", timeout=10)\n",
    "response.raise_for_status()\n",
    "tools = response.json()\n",
    "pprint(tools)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4c5b717",
   "metadata": {},
   "source": [
    "**Invoke the Brave Web Search tool:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b0d6768",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Invoke the brave_web_search tool\n",
    "query = \"best tacos in Los Angeles\"\n",
    "payload = {\"tool_name\": \"brave_web_search\", \"tool_args\": {\"query\": query}}\n",
    "resp = requests.post(f\"{BASE_URL}/call\", json=payload)\n",
    "print(f\"\\n\\nQuery:{query}\")\n",
    "print(\"\\n\\nResults:\\n\\n\")\n",
    "pprint(resp.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24abf463",
   "metadata": {},
   "source": [
    "## 4.  Production deployment with Anyscale service\n",
    "\n",
    "For production deployment, use Anyscale Services to deploy the Ray Serve app to a dedicated cluster without modifying the code. Anyscale ensures scalability, fault tolerance, and load balancing, keeping the service resilient against node failures, high traffic, and rolling updates.\n",
    "\n",
    "Use the following command to deploy the service:\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7389f9a1",
   "metadata": {},
   "source": [
    "```bash\n",
    "anyscale service deploy brave_mcp_ray_serve:brave_search_tool --name=brave_search_tool_service\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "689d700f",
   "metadata": {},
   "source": [
    "**Note:**\n",
    " \n",
    "This Anyscale service pulls the associated dependencies, compute config, and service config from the workspace. To define these explicitly, you can deploy from a config.yaml file using the -f flag. See [ServiceConfig reference](https://docs.anyscale.com/reference/service-api/#serviceconfig) for details."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25755fc6",
   "metadata": {},
   "source": [
    "## 5. Query the production service\n",
    "\n",
    "When you deploy, you expose the service to a publicly accessible IP address, which you can send requests to.\n",
    "\n",
    "In the preceding cell’s output, copy your API_KEY and BASE_URL. As an example, the values look like the following:\n",
    "\n",
    "* BASE_URL: https://brave-search-tool-service-jgz99.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com\n",
    "* TOKEN: yW2n0QPjUyUfyS6W6rIRIoEfFr80-JjXmnoEQGbTe7E\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "Fill in the following placeholder values for the BASE_URL and API_KEY in the following Python requests object:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08d679c7",
   "metadata": {},
   "source": [
    "```python\n",
    "import httpx\n",
    "import asyncio\n",
    "from pprint import pprint\n",
    "import requests\n",
    "\n",
    "# Service specific config.\n",
    "BASE_URL = \"https://brave-search-tool-service-jgz99.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com\" # Replace with your own URL\n",
    "TOKEN = \"yW2n0QPjUyUfyS6W6rIRIoEfFr80-JjXmnoEQGbTe7E\" # Replace with your own token\n",
    "\n",
    "# Prepare the auth header.\n",
    "HEADERS = {\n",
    "    \"Authorization\": f\"Bearer {TOKEN}\"\n",
    "}\n",
    "\n",
    "# List tools.\n",
    "resp = requests.get(f\"{BASE_URL}/tools\", headers=HEADERS)\n",
    "resp.raise_for_status()\n",
    "print(\"Tools:\\n\\n\")\n",
    "pprint(resp.json())\n",
    "\n",
    "# Invoke search.\n",
    "query = \"best tacos in Los Angeles\"\n",
    "payload = {\"tool_name\": \"brave_web_search\", \"tool_args\": {\"query\": query}}\n",
    "resp = requests.post(f\"{BASE_URL}/call\", json=payload, headers=HEADERS)\n",
    "print(f\"\\n\\nQuery:{query}\")\n",
    "print(\"\\n\\nResults:\\n\\n\")\n",
    "pprint(resp.json())\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
